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Abstract: We develop a method of multifractal analysis of A^-body cosmological simu- 
lations that improves on the customary counts-in-cells method by taking special care of 
the effects of discreteness and large scale homogeneity. The analysis of the Mare-Nostrum 
simulation with our method provides strong evidence of self-similar multifractal distribu- 
tions of dark matter and gas, with a halo mass function that is of Press-Schechter type but 
has a power-law exponent —2, as corresponds to a multifractal. Furthermore, our analysis 
shows that the dark matter and gas distributions are indistinguishable as multifractals. 
To determine if there is any gas biasing, we calculate the cross-correlation coefficient, with 
negative but inconclusive results. Hence, we develop an effective Bayesian analysis con- 
nected with information theory, which clearly demonstrates that the gas is biased in a 
long range of scales, up to the scale of homogeneity. However, entropic measures related 
to the Bayesian analysis show that this gas bias is small (in a precise sense) and is such 
that the fractal singularities of both distributions coincide and are identical. We conclude 
that this common multifractal cosmic web structure is determined by the dynamics and is 
independent of the initial conditions. 
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1. Introduction 

The large scale structure of the Universe can be described as a "cosmic web" formed by 
matter sheets, filaments and nodes. This type of structure was initially proposed in con- 
nection with simplified but insightful models of the cosmic dynamics Q and has been since 
confirmed by galaxy surveys and A^-body cosmological simulations Cosmological sim- 
ulations have been especially helpful in testing models of structure formation. In a sense, 
they have been complementary to observations, since observations are biased towards the 
luminous matter, while simulations have fully considered the evolution of the dark matter, 
which is actually the dominant component. In fact, many simulations only consider dark 
matter, in particular, non-baryonic cold dark matter, whose dynamics is simplest to simu- 
late and gives rise to cosmic structure that is in accord with observations. However, due to 
the advances in parallel computing, the development of efficient codes, and the availability 
of more powerful computers, the scope of A^-body simulations has recently changed: now 
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it is possible to simulate the combined dynamics of the non-baryonic dark matter and the 
baryon gas in large cosmological volumes and with relatively good resolution. 

We analyse here the data output of a recent large cosmological simulation of the 
combined dark matter and gas dynamics, namely, a simulation of the cosmic evolution 
of 1024^ dark-matter particles and an equal number of gas particles carried out by the 
Mare-Nostrum supercomputer in Barcelona. This dataset has already been analysed by 
the researchers in charge of the Mare-Nostrum universe project ^, ^. Here, we are 
interested in a particular aspect of the dark matter and gas distributions: their geometry 
and, specifically, their fractal geometry. 

Fractal geometry Q is the geometry of sets or distributions that have noticeable ge- 
ometrical features on ever decreasing scales. It is related to scale invariance and indeed 
appears in nonlinear dynamical systems in which the dynamics is characterized by the 
absence of reference scales. This is the case of the dynamics of collision- less cold dark 
matter (CDM), only subjected to the gravitational interaction. Therefore, the cosmic 
web produced by this type of dynamics has fine structure and it is, arguably, statistically 
self-similar. We can reasonably assume that the cosmic web is a multifractal attractor of 
the gravitational dynamics. This model is supported by the results of CDM simulations 
0, I, I, 0, |ll|. Although the gas dynamics is more complex (due to the gas pressure, 
etc), the gas takes part in the nonlinear dynamics of structure formation and can also 
have a multifractal attractor. Indeed, scaling laws in the distribution of galaxies have a 
long history, which has been reviewed in Refs. |12, 13, 14]. Therefore, it is interesting to 
compare the scaling laws in the distribution of gas with the scaling laws in the distribution 
of dark matter. 

Fractal models of the cosmic structure can only be valid in a range of scales, whose 
upper cutoff is the scale of homogeneity. Its value has been the subject of considerable 
debates and still is controversial |l4|. In contrast, the lower cutoff to scaling has 
attracted less attention. In fact, the CDM gravitational dynamics does not introduce 
any small reference scale that can play the role of a lower cutoff, but the gas dynamics 
introduces the Jeans length. This length is not a fixed reference scale, for it depends 
on the local thermodynamical parameters. In any event, one should expect that the lower 
cutoff to scaling in the dark matter distribution is smaller than the lower cutoff appropriate 
for the distribution of galaxies. However, the opposite seems to be true if one compares 
galaxy surveys with the results of cosmological simulations, since the latter exhibit reduced 
scaling ranges, even in dark matter only simulations. Peebles has included this problem 
in his list of anomalies in standard cosmology [15|. In his words: "scale-dependent biasing 
seems an awkward way to account for the power-law forms of the low order galaxy position 
correlation functions." 

One can be inclined to place more trust in the scaling range found in galaxy surveys: 
cosmological simulations allow one to obtain better statistics but they are not free of 
systematic errors that affect an important range of the smaller scales. Indeed, it has been 
long known that A'^-body simulations are not fully reliable on scales smaller than the mean 
particle spacing 

iV-V3 m, |l3]. In spite of the ever-growing value of N, the range of scales 
between the scale A^~^/'^ and the homogeneity scale is still rather small. In the Mare- 
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Nostrum universe, this scale range spans a factor of 30 (see Sects. § and P). Our goal is 
to demonstrate multifractality of the dark matter and gas distributions in the valid scale 
range. Furthermore, given that this scale range is small, we devise a method to correct for 
discreteness effects and thus extend the valid range to smaller scales, obtaining a reasonable 
scaling range. We also intend to test if the dark matter and gas distributions constitute 
a unique distribution or to what extent they differ. Hence, we make a model of fractal 
biasing. 

We describe our method of coarse multifractal analysis by counts in cells and define 
the basic objects (halos) in Sect. |2|. In our method, the scale of homogeneity is explicitly 
introduced to calculate the multifractal spectrum (Sect. 2.1). In Sub-sect. we show how 
to obtain the main features of this spectrum and how they are influenced by discreteness 
and large scale homogeneity. In Sect. |3|, we apply our method to the zero-redshift particle 
distributions of the Mare-Nostrum universe: (i) we obtain the halo mass functions and 
discuss its relation to the Press-Schechter mass function in Sect. (ii) we obtain the 
multifractal spectra and discuss their relevance in regard to other geometrical studies of 
the cosmic web in Sect. 3.2; and (iii) we demonstrate scaling and compute sound values 
of the correlation dimensions in Sect. |3.3| . The similarity of the results corresponding to 
the gas and the dark matter suggests that both distributions are identical and shows the 
need of precise statistical methods to discriminate between them (Sect. Q). Since the cross- 
correlations cannot give a definite answer (Sub-sect. we develop an effective Bayesian 
analysis (Sub-sect. 4^) which we apply to various cell distributions (Sub-sect. |4.3| ). This 
analysis connects with the thermodynamic entropy of mixing (Sub-sect. [4.41 ). Therefore, we 
study the application of entropic measures to discriminating between mass distributions, 
and we study the connection of entropies in the continuum limit with the multifractal 
spectrum (Sect. ^). Finally, we discuss our results (Sect. |^. 

A note on notation: we use frequently the asymptotic signs ~ and w; for example, 
f{x) ~ g{x) or f{x) ~ g{x) (often without making explicit the independent variable x). 
The former means that the limit of f{x) / g{x) is finite and non-vanishing when x approaches 
some value (which can be zero or infinity), while the latter means, in addition, that the 
limit is one. We also use the sign ~, which only refers to imprecise numerical values (with 
unspecified errors). 



2. Methods of data analysis 

The Mare-Nostrum cosmological simulation is described by Gottlober et al |p. It assumes 
a spatially flat concordance model with parameters Oa = 0.7, = 0.3, ilbar = 0.045, 
Hubble parameter h = 0.7, and initial spectrum with spectral index n = 1, in a comoving 
cube of 500 h^^ Mpc edges. The Gadget-2 code [18| simulated the evolution of dark 



matter and gas from redshift z = 40 to z = 0. Both dark matter and gas are resolved by 
1024^ particles, respectively, which results in a mass of 8.24 • 10^ Mq per dark-matter 
particle and a mass of 1.45 • 10^ Mq per gas particle. The Gadget-2 code implements 
polytropic (adiabatic) evolution of the gas. It can also include dissipation due to radiation 
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or conduction, but these processes have not been included in the Mare-Nostrum simulation. 
Nevertheless, the code always includes an artificial viscosity to take care of shock waves. 

The Mare-Nostrum universe consists of 135 evenly spaced snapshots. For our statistical 
analysis, we only need the z = snapshot, in which the homogeneity scale is largest and 
the structures are most developed. The large size of a Mare-Nostrum universe snapshot 
makes it unwieldy, so it is convenient (and almost necessary) to analyse it in terms of 
compound structures, namely, halos, rather than analysing the full particle distributions. 
The Mare-Nostrum universe researchers ^, |^] use a friends-of-friends algorithm to define 
halos, and then they study the distribution and features of those halos. However, we 
prefer the method of counts in cells, more suitable for studying the continuum limit and 
the scaling properties of particle distributions. Therefore, our elementary objects (halos) 
are cells with constant size but variable mass. The definition of elementary objects in 
distributions with fine structure (fractals) is arbitrary to a high degree, being actually tied 
to the measurement or analysis technique. The definition of elementary objects by coarse 



graining and, in particular, their definition as cells in a mesh, is very convenient [10|. In 
absence of a reference scale, the appropriate cell size (the coarse-graining scale) is arbitrary, 
and there is no clear distinction between inner and outer structure. However, an A^-point 
fractal sample, as a finite point distribution, has a reference scale, namely, the discreteness 
scale N~^/^, which allows us to properly define the size of elementary objects. 

At any rate, the cell size must be considered a running scale. The use of a running cell 
size is useful, for example, to distinguish the nonlinear scales where structure formation 
takes place from the linear scales where the initial conditions are preserved: as the cell size 
enters in the range of the latter scales, the fluctuations of the counts in cells are reduced to 
small Gaussian fluctuations. This homogeneity scale is actually the only real scale in the 
cosmic CDM dynamics, although it is not a sharp scale and, besides, it grows with time. 

The method of counts in cells is also suitable for comparing the gas distribution with 
the dark matter distribution, by comparing the respective counts, for a given cell size. Of 
course, we must devise methods to provide these comparisons with statistical meaning. 
We defer further description of our methods to Sect. ^. However, we advance that our 
main procedure naturally connects with the description of multifractals in terms of Renyi 
dimensions. 

In summary, our basic assumption is that the Mare-Nostrum particle distributions 
represent continuous mass distributions with fine structure but which are homogeneous 
on the large scales. In particular, we expect continuous distributions of cosmic web type, 
which have various kinds of density singularities produced by gravitational collapse. The 
properties of these singularities can be deduced by suppressing the effects of discreteness. 
We introduce in next sub-section methods of multifractal analysis geared to the relevant 



type of singular distributions. In sub-section 2.2, we study the infiuence of the discreteness 



scale and the homogeneity scale on the features of the coarse multifractal spectrum. 
2.1 Counts in cells and coarse multifractal analysis 

Let us assume that a mesh of cells is placed in the sample region (the simulation cube). In 
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the method of counts in cehs, (fractional) statistical moments are defined as 



n>0 



where the index i refers to non-empty cells, rii is the number of points (particles) in the 
cell i, N = - rii is the total number of points, and N{n) is the number of cells with n 
points.^ The second expression involves a sum over cell populations and it is more useful 
than the sum over individual cells, because the range of n is much smaller (when the cell 
size is small). Mq is the number of non-empty cells and Mi = 1. We understand the latter 
as a mass normalization, namely, the mass in cell i is ni/N and the total mass is one, such 
that the mass distribution can be interpreted as a probability distribution (the physical 
masses of gas or dark-matter particles play no role in the statistical analysis). There is an 
alternate definition of (7- moments: 

N{n) ( n \i Mq 



n>0 ^ ^ 

where V is the cell's volume, N{n)/MQ is the fraction of cells that contain n points, and 
p = n/{NV) is the density in those cells. With this definition, /ig = 1 while /ii is not fixed. 
We notice that the moments with positive integer q {Mq or Hq, q £ N) are sufficient for 
regular distributions, but we cannot impose this restriction here ((? S M). 

In regular distributions, the mass contained in any cell is proportional to its volume 
V, in the continuum limit V ^ 0. Therefore, Mq ~ V'^~^. However, we consider singular 
distributions such that their g-moments are non-trivial power laws of V in the continuum 



limit, namely, distributions such that one can define |19| the exponents 



These distributions are called multifractals.^ Of course, the numerical evaluation of the 
limit in Eq. ( |2.3D is not feasible and one must be satisfied with finding a constant value of 
the quotient for sufficiently small V, that is, in a sufficiently long range of negative values of 
log V (a range of scales). In fact, the exponent is normally defined as the slope of the plot 
of logMq versus logV, and its value is found by numerically fitting that slope, supposing 
that a meaningful fit is possible. 

A multifractal is also characterized by a set of local dimensions: the local dimension 
at one point says how the mass grows from that point outwards. Every set of points with 



""^ Central moments are defined by subtracting from n/N its average. In the strongly nonlinear regime, 
central moments are less convenient. 

^The mathematical definition of a multifractal distribution only requires the existence of r(g), which is 
a mild condition on the type of singularities and does not necessarily imply self-similarity. For example, 
an isolated power-law singularity or a massive particle in a uniform background both give rise to non- 
trivial "bifractal" functions r(g). Nonetheless, physically relevant distributions with non-trivial r(g) usually 
exhibit some kind of self-similarity, albeit in a statistical sense. 
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a given local dimension a constitutes a fractal set with dimension f{a). In terms of T{q), 
the spectrum of local dimensions is given by 

a{q)=T'{q), q€R, (2.4) 

and the spectrum of fractal dimensions f{a) is given by the Legendre transform 

f{a)=qa-r{q). (2.5) 

The spectrum of fractal dimensions is convex upwards and fulfills /(a) < a. The fractal 
dimension /(a) reaches the local dimension a at q = 1 [note that Eq. ( |2.3D gives r(l) = 0]. 
The set of singularities with /(ai) = ai contains the bulk of the mass and is called the 
"mass concentrate." 

In addition to the exact exponent r(g) (^), we define, for a given cell size, the coarse 
exponent 

log(M„/K^'^) . X 

where V is the cell size and Vq is the homogeneity scale, such that the density is homo- 
geneous and Mq « for V > Vq. The coarse exponent depends on both V and Vq, 
but this dependence vanishes V <ti Vq (assuming that the limit y — )• exists). The 



introduction of the homogeneity scale in Eq. ( p.6| ) improves the definition used in Ref. |10| 
for the GIF2 simulation, where no Vq is introduced (equivalent to setting Vq = 1). Given 
that the Mare-Nostrum universe cube has 500 Mpc edges, much longer than the 110 

Mpc edges of the GIF2 simulation cube, it is important now to take the transition to 
homogeneity into account in the definition of the coarse exponent, if we want it to be a 
good approximation of the limit (|2.3| ) for moderately small V. 

The homogeneity scale Vq can be found as the scale of crossover to homogeneity in the 
scaling of statistical moments (Sect. |3.3| ). We can also estimate it as the coarse-graining 
scale such that the mass fluctuations are smaller than, say, 10%; namely, we define it as 
the scale such that fi2 = 1-1. Thus, we find that the scale of homogeneity is about l/16th 
of the edge of the cube, namely, about 30 Mpc. This value is similar to the value of 
the GIF2 homogeneity scale found in Ref. [|l^,^ where it is calculated from the crossover 
in the scaling of moments. 

Besides the multifractal spectrum f{a), it is useful to define the spectrum of Renyi 
dimensions |jl^ 

= (2.7) 

q-1 

They have an information-theoretic meaning, which will be explained in detail in Sect. |5|. 
In particular, the dimension of the mass concentrate ai = /(ai) = Di is also called the en- 
tropy dimension. Dq coincides with the maximum value of /(a) and with the box-counting 
dimension of the distribution's support, while D2 = t(2) is the correlation dimension. In 
the homogeneous regime, Mg ~ yj-i ^nd Dg = 3 for any q. In a uniform fractal (a 
unifractal or monofractal) Dg is also constant but smaller than three. In general, Dg is a 
non-increasing function of q. 

^The value found in Ref. ro ~ lAh'^ Mpc, is roughly equivalent to half the edge of the cube such 
that fj.2 < l-l- 
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2.2 Features of the coarse multifractal spectrum 

Here, we examine the features of the multifractal spectrum obtained from the coarse ex- 
ponent defined by Eq. (|2.6| ). 

In a multifractal, the cell size V is, of course, irrelevant, as long as V is sufficiently 
smaller than the homogeneity scale Vq. However, the intrinsic discreteness of a multifractal 
sample (a finite point distribution) gives rise to another scale, namely, the size of the cell 
such that there is one point per cell on average {V = N^^). This scale represents the 
minimal scale at which the distribution can be consistently considered continuous. In the 
initial stages of an A^-body simulation, when there are only very small deviations from 
the one-particle-per-cell average, it is obvious that it makes no sense to consider smaller 
scales. Furthermore, the dynamics of gravitational collapse is deeply distorted on volumes 
V < N^'^, so the resulting particle clusters do not represent the structures that result 



from the collapse of a continuous medium [16, 17]. As a coarse-graining scale, the volume 



V = N~'^ produces the largest variety of masses of coarse-grained objects in A^-body 
cosmological simulations Thus, this cell size provides us with a sort of master cell 

distribution that characterizes the multifractal sample. Whenever we mention halos, we 
refer to non-empty master cells, preferably with a considerable number of particles. Since 
the number of dark-matter or gas particles in the Mare-Nostrum universe is a perfect cube 
and, indeed, a power of two, the master cell distributions are easily obtained. 

Ref. [|lO| shows that the mass function of halos in the GIF2 simulation follows the 
power law N{m) ~ except at the large mass end, where it decays faster. This power 
law derives from an approximation of the multifractal spectrum, namely, /(a) ~ a, and 
therefore represents the mass concentrate of the multifractal. In contrast, the master cell 
distribution contains no information of the matter distribution in voids (zones with a > 3), 
because they are empty |l^]. Hence, a part of the multifractal spectrum is missing even 
at this scale. As V shrinks, the multifractal spectrum is reduced further. 

The length scale that corresponds to V = in the Mare-Nostrum simulation, 

namely, I = N''^/^ = 2"!°, is only a factor 2^ = 64 smaller than Iq = Vq^^ = 2'^. This 
is the largest scaling range that could be attainable in principle, despite the large number 
of particles. In fact, close to the large scale end, at Iq = 2~^, the coarse multifractal 
spectrum is influenced by homogeneity, whereas close to the opposite end it is influenced 
by discreteness. Surely, the best estimation of the real spectrum is to be found somewhere 
in between. Let us study in detail the change of the features of the coarse multifractal 
spectrum with scale. 

For a given coarse-graining scale, we calculate with Eqs. (^]^) and ( |2.6D the exponent 
r(g), and hence we calculate the coarse multifractal spectrum through the Legendre trans- 
form given by ( |2.4| ) and (|2.5| ). The lower end of this spectrum corresponds to the limit 
q — )• oo, that is to say, to the cell(s) with maximum number of particles: 

«min = lim a{q) = 3 . s , (2.8) 

q^oo lOg[V/Voj 

/(amin = -3 — - — . 2.9) 

\og{V Vq) 
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Since amin is the local dimension of the strongest singularity, it changes little with the 
scale, unless WG approcich. h.oinogGii6ity — y ^^)? which, imphes that Qmin 

— )■ 3. Usually, 

max) — 1) namely, there is only one cell with the maximum number of particles. 
Therefore, the choice Vq = 1, which disregards the effect of homogeneity, implies that 
/(ttmin) = 0. However, any Vq < 1, like our present setting Vq = 1/4096, implies that the 
fractal dimension /(amin) is negative! 

Intuitively, negative fractal dimensions seem meaningless, but they often arise in the 
study of random multifractals. The origin of negative fractal dimensions has been discussed 
by Mandelbrot [20|. In brief, the coarse fractal dimension of a set of singularities in a 
random multifractal is proportional to the logarithm of their number, but the expected 
value of this number can be smaller than one. Therefore, sets of singularities with negative 
fractal dimension are probably empty. In our case, by setting Vq to a fraction of the total 
volume, the number of singularities with given a in cubes of size Vq fluctuates and these 
fluctuations are more important for values of a such that there are few singularities with 
that a in the whole simulation box. Thus, it is convenient to "average" over the Fq-^ = 4096 
cubes and consider at once the 4096 singularities with smallest a, truncating the negative 
values of the multifractal spectrum. 

In analogy with the lower end of the spectrum of local dimensions, we can deduce that 
its upper end corresponds to the limit q — >■ — oo, that is, to the set of cells with one particle 
(assuming that V is not so large that there are none). In fact, 

„log(A%) 

«-x = ^lun^ a{q) = -3j^^^^ , (2-10) 

fiami,^) = -3-^ — ■ (2.11) 

log(y/Vb) 

Notice that the master cell distribution has Omax = 3 and, therefore, its spectrum is limited 
to non-void zones (a < 3). The value of Omax increases for cell sizes V > 1/N, as voids 
begin to be sampled. For sufficiently large V, N{1) decreases and approaches I/Vq = 4096 
(only one cell with one particle per each cube of size Vq). Then, /(amax) decreases to zero. 
At this scale, we have the complete (positive) multifractal spectrum in the region a > 3, 
corresponding to voids, and the distribution can be considered continuous over the entire 
range of a [we always discard the negative values of /(a)]. 
The total span of the spectrum is 

_ _ ^ log(nmax/?^min) 

Clmax Ctmin — "J , , , ^ , 

iog(y/vb) 

where rimin = 1 in the relevant range of V. Naturally, the largest span is reached when the 
spectrum is complete in the region a > 3. For the Mare-Nostrum universe, we indeed show 
in Sect. 3.2 that we obtain, by choosing V to be the largest value such that /(omax) > 0, 
the largest span of dimensions a and a good estimate of the full multifractal spectrum. 
For larger values oiV, as the transition to homogeneity begins, nmin grows and approaches 
'^max, with the consequent contraction of the span of the spectrum. 

Scale invariance implies that the multifractal spectra at different coarse-graining scales 
coincide in their respective ranges [amin,«max]) where amin is roughly constant but amax 
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increases with the scale. However, the under-sampling of low density regions that causes 
the truncation of the spectrum at Omax also causes deviations from the true spectrum close 
to 

Q^max- These deviations must be corrected. We see how to do it for the Mare-Nostrum 



multifractal spectra in Sects. 3.2 and 3.3. 

Regarding the master cell distribution and assuming for it the simple mass function 
N{n) = N{\)/n?, we can deduce interesting consequences about the corresponding coarse 
multifractal spectrum. First, we calculate, according to Eq. (|2.1|), 



Mn 



"max OO ^ 2 



n=l 



n=l 



Since this sum is just the number of non-empty cells, we deduce that the fraction of non- 
empty cells containing one particle is N{1)/Mq ^ G/vr^ = 0.61. Thus, the full distribution 
N{n) is determined by just the number of empty cells. Furthermore, from the expression 



^max A r / 1 \ 



(2.12) 



n=l 



and the condition Mi = 1 we can determine fimax- Then, the dimension of the mass 
concentrate ai = /(ai) is 



ai = r'(l) 






-InFoj 


9=1 




In^ 


-InVb 


N 



ln(Wo; 



This dimension is the arithmetic mectn of the general values of O-mhi 

Eq. (^M and 

in Eq. (|2lo| ). 



3. Multifractal analysis of the dark matter and gas distributions 

We now present the results of the multifractal analysis of the Mare-Nostrum universe z = Q 
snapshot, beginning with the halo mass functions given by the counts in the master cell 
distributions (Sect. |3.1[ ). In Sect. we study the multifractal spectra in the range of 
scales covering several powers of two, namely, from / = 2~^^ to I = 2~^. The latter scale 
is the smallest scale (among the powers of two) such that A^(l) < 4096 and therefore the 
spectrum corresponding to voids is complete. On smaller scales, namely, between / = 2~^^ 
and / = 2~^, the high-a ends of the coarse spectra deviate from the true spectrum due to 



under-sampling of the low density regions. In Sect. |3.3| , we propose to correct for under- 
sampling by removing the erroneous ends of the spectrum. Thus, we can demonstrate scale 
invariance in the longest possible range. 
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°g2N dark matter l°g2 




Figure 1: Log- log plots of the number of halos N versus their mass m (number of particles) at 
coarse-graining scale 1024^^, in the case of dark matter (left) and gas (right). 



3.1 Mass functions 



In Fig. [H are plotted the halo mass functions of dark-matter and gas, obtained from the 
counts in the master cell distributions. The mass m is actually defined as the number of 
particles, for simplicity. Both mass functions follow the power law N{m) ~ over a 
considerable range of m: least-squares fits in the log2 m range from to 9 yield slopes 
—2.07, for the dark matter, and —2.12, for the gas. 

There are 156272463 cells with one dark-matter particle and 170546782 cells with one 
gas particle in the master cell distribution. According to Eq. (2.11), the fractal dimensions 



of the sets with amax = 3 are /(amax) = 2.54 and 2.56, for the dark matter and gas, 
respectively. The cell with the largest proportion of dark matter has 20658 dark-matter 
particles and it also has the largest proportion of gas, namely, 19200 gas particles; all the 
particles together form the most massive halo. The corresponding values of amim according 
to Eq. (^^), are 0.61 and 0.63, respectively. However, Eq. (|2.9|) yields negative values of 



/(cemin), which we do not consider. We compute directly from Eqs. (|2.4D , (|2.5[) and 



that the values of a such that /(a) = are 0.91 (dark matter) and 0.95 (gas). 

We have seen in the preceding section that the value of ai corresponding to the master 
cell distribution can be estimated as the arithmetic mean of amin and Omax = 3. Whether 
we use amin — 0.6 or amin — 0.9, this estimation yields smaller values than the actual 
values, which are 2.22 (dark matter) and 2.29 (gas). On the other hand, the estimation 
"imax = exp[iV/7V(l)], deduced by making Mi = 1 in Eq. ( |2.12| ), yields 967 and 542, 
respectively, well below the real values (see Fig. |^). The problem is that the power law is 
modified at the large mass end, as we can perceive in Fig. |l|. On the one hand, at the large 
mass end, the values of N{m) are so small that there are many values of m for each value 
of N; on the other hand, as a function of the average of the corresponding values of m 
decays faster than a power law. In fact, the above estimated values of m^^x actually mark 
the ends of the power laws, instead of the ends of the large masses. 

We can improve the fit of the mass function by modelling the large mass end of the 
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power law. For this, we can take inspiration from the Press-Schechter mass function, 



N(m) oc — 



n/6-3/2 



exp 



m 



n/3+l' 



(3.1) 



where n > —3 is the spectral index of the initial power spectrum and m* stands for the 
large-mass cutoff. In fact, the agreement between the power-law parts of Eq. ( |3.lD and of 
the found mass function demands n — )• —3. Therefore, we take 



N{m) A^(l)m"^exp 



m 



(3.2) 



where e > is to be fitted, as well as m*. The latter can be deduced from the condition 
Ml = 1; namely, 



Ml 



A^(l) ^ exp[— (m/m*) 



N 



m=l 



N 



dm 



m=l 



m 

exp[— (m/m*)'^ 
m 



N 



Inm* 



It is independent of e, and coincides with the value given by Eq. (2.12) if we identify mmax 
there with m*. This identification is natural, because the exponential form ( p.2| ) is just 
one way of introducing a mass cutoff that is more adequate than the sharp cutoff used 
in Eq. ( 2.12 ). We can see why the above quoted values of mmax, below 1000, actually 
mark the end of the power laws. The new values of mmax are obtained from expression 
(|3.2| ) by requiring iV(mmax) = 1- Thus, this model raises the estimations of mmaxj but the 
new values depend on e. For e = 1, mmax is equal to 2849 (dark matter) or 2022 (gas). 
Naturally, better estimations are obtained by taking smaller e. In fact, the Press-Schechter 
mass function must be substituted by a lognormal mass function [p!o| , in which the power 
{m/m^Y becomes [ln(m/m*)]'^. 

3.2 Multifractal spectra and cosmic web structure 

The coarse multifractal spectrum is easily computed from the counts in cells, through 
Eq. d?!) and Eqs. (U), (|^ and (U). We plot in Fig. | the multifractal spectra of 
the dark-matter and gas distributions at scales from / = 2^^^ up to / = 2^'^. We stop 
at this scale because we already have the full spectrum, and on larger scales it begins 
to show signs of a transition to homogeneity. For comparison, we also plot the spectra 
corresponding to the distributions at / = 2~^, which are homogeneous [we have computed 
them using Eq. ( |2.6[ ) with Vq = 1]. 

The six multifractal spectra at successive scales coincide closely in their respective 
ranges, except near Omax; and the spectra corresponding to the dark matter are almost 
identical to the ones corresponding to the gas (Fig. |2|). In addition, they all are similar to 



the multifractal spectra of the GIF2 simulation obtained in Ref. [10|, although they span a 
slightly larger range of local dimensions. By increasing the reference scale Vq in Eq. (p.6|), 
we observe that the span of a at a given scale shrinks, and thus we deduce that the slightly 
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smaller spans in the GIF2 simulation are due to having set there no homogeneity scale 
(Vo = 1). The universal multifractal spectrum of cosmological distributions that all these 
results suggest is typical of statistically self-similar multifractals. 

The dimension of the mass concentrate in the spectra of Fig. |2| slightly rises as the 
coarse-graining length grows; taking all the spectra into account, we estimate ai ~ 2.4. 
This value agrees with the value obtained from the GIF2 simulation. It is a remarkably 
high value, which makes the mass concentrate relatively homogeneous. It is interesting to 
consider the meaning of this high dimension for a cosmic web structure. This type of struc- 
ture presumably possesses singularities of the three possible kinds, namely, singular points, 
curves and surfaces, called nodes, filaments and sheets, respectively. At first sight, the 
high value of Di = ai may suggest that the mass concentrates in the highest dimensional 
structures, namely, sheets (Zel'dovich's "pancakes"). However, self-similar distributions 
of filaments or even of nodes can also reach fractal dimensions higher than two. There- 
fore, detailed morphological studies are necessary to decide the relative weight of sheets, 
filaments and nodes in the cosmic web. 

Morphological studies of multifractal distributions are by no means easy. In fact, fractal 
dimensions do not reveal whether a distribution consists of points, curves or surfaces. This 
information is given by the topological dimension, whereas the fractal dimension informs 
about the clustering of objects of given topological dimension.^ Unfortunately, topological 
dimensions are very difficult to estimate from finite samples of singular distributions. One 
method of studying the topology of a cosmic web finite sample has been devised by Sheth 
et al [^ ]. Their method is based on a surface modelling algorithm ("SurfGen"). Other 
methods are described by van de Weygaert &: Schaap |22], e.g., the method based on the 
Delaunay tessellation field estimator. Many morphological studies of the cosmic web have 
focused on its voids, for the boundaries of voids define the matter sheets (or vice versa); 
but there is no unique definition of voids in finite samples. Cosmic foams with self-similar 
distributions of voids have relatively simple structures, with well defined distributions of 
sheets, filaments and nodes. Besides, the scaling of voids is easily demonstrated in finite 
samples of these distributions. However, the cosmic web seems to be better described as a 
non-lacunar multifractal with much more complex geometry 

The dimension of the multifractal mass concentrate ai ~ 2.4 that we find differs from 
standard determinations of the fractal dimension of the galaxy distribution, which yield 
values close to two but usually smaller [|^, 14 1. However, this dimension is determined 
from the two-point correlation function and, therefore, it corresponds to the correlation 
dimension r(2) = D2, which must be smaller than ai = Di (in a multifractal). We 
determine D2 in Sect. 3.3. 

*The topological dimension is a topological invariant, unlike the Hausdorff-Besicovitch (fractal) dimen- 
sion. The topological dimension can be defined in several equivalent ways and is always an integer: it is zero 
for a point, one for a curve, two for a surface, etc. The Hausdorff-Besicovitch dimension is bounded below 
by the topological dimension. Actually, Mandelbrot |^] defines a fractal as a set with Hausdorff-Besicovitch 
dimension strictly higher than its topological dimension. Therefore, the degree of fractal clustering is 
measured by the difference between both dimensions. 

^Note that this statement strictly applies to the full matter distribution, whereas the cosmic web of 
galaxies could have a low lacunarity, as discussed in Ref. |ll| . 



- 12 - 




Figure 2: Multifractal spectra at scales 1 = 2 ^^,2 ^^,...,2 '', plotted with solid lines in succe- 
sively lighter tones of grey. The light dashed lines are the spectra of the homogeneous distributions 
at I = 2-3. 



Another interesting dimension is Dq, the box-counting dimension of the distribution's 
support. Since it coincides with the maximum of /(a), Fig. ^ shows that a rehable value 
of Dq can only be obtained from the scale / = 2"^ upwards. This value is 3, confirming 
the conclusion that the cosmic web is a non-lacunar multifractal [11|. Note that having 
Dq = 3 implies that the empty cells that appear in increasing numbers for / < 2"^ are 
actually empty because they belong to under-sampled zones. Moreover, the high-a ends 
of the spectra at scales / < 2"^ are given by scarcely occupied cells and, naturally, deviate 
from the true spectrum (best represented at / = 2^'^); in particular, the maximum of /(a) 
is depressed, creating the false impression of lacunarity. In fact, it is necessary to suppress 
scarcely occupied cells to fully demonstrate scale invariance, as we show next. 



3.3 Scaling of second order moments and correlation dimensions 

The superposition of the coarse spectra at / = 2^^^, . . . , 2^^ in their respective a ranges 
that is shown in Fig. ^ constitutes a proof of multifractality. However, the standard proof 
of scale invariance for multifractals is based on the definition of r-exponents in Eq. (p.3|): 
scale invariance demands the scaling of Mq in a range of cell sizes that is sufficient to 
calculate a meaningful r(q) and, hence, Dq. The exponent r(g) is normally calculated by 
fitting the slope of the plot of logM^ versus log/. We now follow this procedure. 

First of all, we need to select the values of q for which we calculate Mq and also select 
the appropriate range of cell sizes. The available range of q is bound above by the condition 
that f{a) > (non-negative fractal dimensions). This bound can be perceived in Fig. |2|, 
for the slopes of the spectra do not become vertical at their left-hand ends, that is to say, 
the respective values of g = /'(a) are bounded above. The bound depends somewhat on 
the particular spectrum and, in fact, becomes smaller as / grows. An examination of the 
numerical values of q for the spectra plotted in Fig. |2| reveals that the largest integer value 
of q that is common to all the spectra is g = 2. Note that the values of f'{a) differ much 
more than the respective values of f{a). 
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Figure 3: Log-log plots for the correlation dimension D2 of the distributions of dark-matter (left) 
and gas (right), showing the fractal scaling range and the transition to homogeneity. 



The possible values of q must also be bounded below: although most spectra in Fig. 
|2| can be nominally extended to q = f'{a) — )• —00, this extension is inside their unreliable 
high-a ends. For example, it is obvious from Fig. ^ that the spectrum at I = 2~^^ (for 
either dark matter or gas) does not represent well the mass concentrate, corresponding to 
the point of contact with the diagonal. Therefore, that spectrum is not valid even down 
to g = 1. Consequently, when we combine the upper and lower bounds, the only integer 
value allowed is g = 2, so we must restrict ourselves to examining the scaling of M2 and 
calculating the correlation dimension D2 = t(2). Notice that this dimension is of special 
interest, since it is the one that is usually measured in galaxy surveys. 

As regards the range of cell sizes in which to look for scaling, the natural range lies 
between the homogeneity scale / = 2~^ and the discreteness scale I = 2~^^. However, we 
have seen above that the effects of under-sampling can already be perceived at I = and 
become more evident at / = 2^^. On the other hand, the cells that are well populated on 
scales / < 2~^ are surely not affected by under-sampling, as proved by the superposition 
of the spectra along their left-hand sides. A sensible way to avoid the effects of under- 
sampling in the computation of M2 is to suppress for each / the scarcely occupied cells that 
contribute to the deviant piece of the corresponding multifractal spectrum. Thus, we set a 
lower cell-mass cutoff m = mo(V^o)"; where Iq = 2~'^, niQ = NIq = 2^^ and, for each /, we 
choose the value of a that marks the beginning of the deviant spectrum. To be definite, we 
assume that the deviant pieces of the spectra begin at their respective maxima (see Fig. 
|2|). Thus, we can proceed to / < 2"^'', but we stop at / = 2~^^ because lower scales present 
several problems: (i) the spectrum hardly represents the mass concentrate; (ii) the value 
of M2 becomes very sensitive to the precise value of the m-cutoff; (iii) the gas distribution 
begins to noticeably depart from the dark-matter distribution. 

Therefore, we compute M2 from the scale / = 2^^^ upwards, and actually we do not 
stop at / = 2~^ but at / = 2~-^, to study the full transition to homogeneity. The log- log 
plots of M2 versus scale / are displayed in Fig. ^. The dashed straight lines correspond to 
the least-squares fits. The two fits in the fractal ranges between / = 2~^'^ and / = 2~^ yield 
the following dimensions: (i) D2 = 1.255 ±0.012 for the dark-matter; (ii) D2 = 1.30 ±0.02 
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for the gas. The two fits in the homogeneity ranges yield values of D2 very close to three, 
of course. 

In each plot, the scale at which the straight line of the fractal fit meets the straight 
line of the homogeneity fit is a measure of the scale of transition to homogeneity Iq. Thus, 
we deduce that Iq = 2^^, approximately, for both the dark matter and the gas. Notice that 
this measure of the scale of homogeneity yields a smaller value than the one that we have 
been using, Iq = 2~^. In fact, the transition to homogeneity is not very sharp but takes 
place between I = 2"^ and / = 2~^, as Fig. ^ shows. 

The value D2 = 1.30 it 0.02 for the gas is definitely smaller than the galaxy correlation 
dimension D2 = 2.0 it 0.1 obtained by Sylos Labini and Pietronero |]l^ but agrees with 



conventional values of D2 14]. Sylos Labini and Pietronero's value stems from their 



criticism of the treatment of finite size effects and, in particular, from questioning the 
classical value of the scale of homogeneity rg — 5 Mpc. Indeed, they extend the scaling 
range of the correlation function up to 30 Mpc and, consequently, D2 grows. Other 
authors also find -D2 — 2, especially when they use long scale ranges to compute it (see 
Table I in Ref. |13|). In our case, the end of the scaling range at / = 2~^ is quite clear, 
although we define as homogeneity scale = 2"^ (a fit up to this scale would hardly raise 
D2, anyway). The scale / = 2~^ is 7.8 Mpc in physical units. 

To summarize the results of Sect. the tests for scale invariance can be considered 
successful, given the limitations imposed by the data. Of course, the scaling range neces- 
sary to affirm scale invariance is a matter of opinion. A factor of 2~^ = 64 is reasonably 
good. In addition to the extent of a scaling range, one must also consider the quality of 
the corresponding least-squares fit, namely, its standard error. In this regard, the fits for 
the dark-matter and gas distributions are both remarkably good. We refrain from affirm- 
ing that we have proved that these distributions are (samples of) statistically self-similar 
multifractals, but we assert that there is strong evidence of it. Furthermore, there is good 
evidence that the dark-matter and gas distributions are indistinguishable multifractals. 
One could object that the confidence intervals for the D2 do not overlap and that there are 
minute differences between the respective plots in Figs. |l| and |2|. To assess the statistical 
significance of the numerical differences between the distributions of dark-matter or gas 
particles, we carry out next a detailed study. 



4. Relation between the gas and dark-matter distributions 

We see that the multifractal properties of the dark-matter and gas distributions are very 
similar (along a considerable range of scales), which suggests that the distributions could 
actually be identical. In general, one may ask if two finite samples of continuous distri- 
butions can come from the same continuous distribution. In particular, it is possible that 
the differences between the distributions of gas and dark matter particles are only due to 
statistical sample variance, while the continuous gas distribution is unbiased with respect 
to the total mass distribution (dominated by the dark matter). We know that the gas 
dynamics is different from the collisionless dark-matter dynamics, with the likely result of 
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bias, but we need to ascertain the existence of bias from the actual particle distributions 
by means of statistical tests. 

The first test that comes to one's mind is based on the cross-correlation function of gas 
and dark-matter particles, in particular, the cross-correlation coefficient, useful to measure 
the similarity of two distributions. Indeed, this test confirms that both distributions are 



very similar, as we show in sub-section 4.1. However, this test cannot prove that the 



samples actually come from the same continuous distribution. In fact, it is easy to see that 
there is no way to prove it and we must satisfy ourselves with obtaining a probability of 
its being true. Rather, assuming a Bayesian point of view, we can quantify the "degree of 
belief" in the hypothesis that there is a common continuous distribution (sub-section ^^ ). 



The application of this method in sub-sect. 4.3 allows us to confidently conclude that the 



gas distribution is biased on nonlinear scales. Then, we study the nature of that bias in 



sub-section 4.4 



To compare the two distributions at several scales, we use counts in cells, like in the 
multifractal analysis. Thus, we assume that two independent continuous distributions de- 
fine the probabilities of the respective counts in cells of given size. In other words, we 
assume that the dark-matter and gas distributions are both samples of respective multi- 
nomial distributions, each one given by a set of probabilities defined in the cells. The 
cross-correlation can be easily expressed in terms of counts in cells. The Bayesian method 
seeks the probability (degree of belief) that two multinomial samples come from the same 
multinomial distribution \l.2[ 

4.1 Cross-correlations 

Given a mass distribution coarse-grained with volume scale V, its auto-correlation is mea- 
sured by the second order cumulant 



6 = ^y" d^Xi(fx2£.2ixi,X2), 



where ^2 is the two-point correlation function of the fine grain distribution. In the nonlinear 
regime, 

{p? {P? pV 

We can define the cross-correlation coefficient of gas (g) and dark- matter (m) at scale V 
as 

_ Cgm _ {Pg Pm) _ Xyi=l '^g* '^mi 



where the last expression refers to counts in volume-!/ cells and M denotes the number 
of these cells. The cross-correlation coefficient can be viewed as the cosine of the angle 
formed by the two M-dimensional vectors {ngj} and {nmi}- 

Given the cell counts {rigi} and {n^i}, we can compute Cgm at once, but we follow 
instead a more elaborate procedure to discern the infiuence of the cell masses. We first rank 
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Figure 4: Cross-correlation coefficient of gas and dark matter in massive halos, as a function of 
the number of halos, ranked in order of decreasing mass. 



the cells in order of decreasing physical mass, for physical mass determines the importance of 
cells in regard to gravity. Then, we compute the cross-correlation coefficient of the ordered 
cells up to successive rank values. In Fig. we plot the cross-correlation coefficient of the 
gas and dark-matter in massive halos (taken from the master cell distribution), computed 
in that way. This coefficient is stably above 0.99, that is to say, the correlation between 
both distributions is very strong. Moreover, the cross-correlation coefficient increases with 
the coarse-graining scale I. For example, it reaches 0.9999 at / = 2~^. However, we have 
no way of knowing how strong the correlation must be for allowing us to affirm that both 
samples come from the same distribution. 

4.2 Bayesian comparison of multinomial distributions 

Bayes' theory of probability interprets the concept of probability as a measure of a state of 
knowledge. Bayes' theorem tells us how to adjust probabilities in regard to new evidence. 
It writes 



P{H\E) 



P{E\H)P{H) 
P{E) 



where H is a. hypothesis with prior probability P{H), E is an event that provides new 
evidence for H, and P{E\H) is the conditional probability of having E if the hypothesis 
H happens to be true. P{E) is the a priori probability of observing the event E under 
all possible hypotheses. P{H\E) adjusts P{H) and is called the posterior probability of 
H given E. Bayesian analysis is routinely employed for model selection in many scientific 
areas. 

In Bayes' theorem, the hypothesis H can belong to a continuum of possibilities. For 
example, if we are given the results of N trials of a binomial experiment, we can analyse 
the information gained from them on the probability p of "success" ("success" is defined 
arbitrarily as one of the two possible outcomes). This probability is a number < p < 1 
(while the probability of "failure" is 1 — p). For a given value of p, the probability of n 
successes in A'^ trials is given by the binomial distribution 



P{N,n\p) 



n 



p"(l-p) 



N-n 
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Since N and n are given and p is unknown, we can apply Bayes' theorem in the form 



P{N,n\p)P{p) p^{l-p)^-^P{p) 
P{n\Jy ,n) — — 



/q P{N, n\p) Pip) dp /o - ^9)^-" P{p) dp 

It yields the probability of p given the data in terms of the prior probability of p. If no prior 
information about p is available, we must assume that P{p) = 1 (according to the principle 
of insufficient reason). Then, the posterior probability P{p\N,n) is the beta distribution 
with parameters n + 1 and N — n + 1. It is trivial to check that it reaches its maximum 
at p = n/N (mode value) and that its variance is proportional to (for fixed n/N and 
large A^). 

This example is, in fact, relevant to our problem, namely, to estimating the probability 
that the given gas and dark-matter samples belong to the same distribution. If we choose 
one cell, with nm dark-matter particles, say, the probability that the mass fraction in that 
cell is pra is given by the beta distribution with parameters and A'm— nm+1 {Nm being 

the total number of dark-matter particles in the sample). Analogously, the probability of 
a gas mass fraction pg in that cell is given by the beta distribution with parameters rig + 1 
and A'^g — rig + 1. We can obtain the probability of the difference — Pg by taking the 
product P{p^\nra, Nj^)P{pg\ng, Ng), performing the change of the variables pm and pg to 
Pra ~ Pg and (pra + Pg)/2, and integrating over the second variable (within the appropriate 
limits). However, the difference p^a — Pg is a continuous variable and its probability is a 
probability density; therefore, the probability that p^ = pg vanishes. Nevertheless, we 
expect to get some information from the value of the probability density at pm = Pg- Thus, 
we calculate 

1 

P{p\nm, iVm) Piping, Ng) dp = 







B{n^ + rig + 1, iVm - rijn + Ng - Ug + 1) 
B{n^ + 1, iVm - + 1) B{ng + 1, iVg - ng + 1) 



(4.1) 



where B{x,y) = T{x) T{y)/T{x + y) is the Euler beta function. The value of the integral 
is enhanced when the maxima of P{p\n^, N^) and P{p\ng, Ng) coincide, namely, when 
nm/Nm = rig/Ng. For fixed N^ = Ng, the function on the right-hand side of Eq. ( ^?l| ) 
is a symmetric function of {n^,ng}. Therefore, for a fixed value of + ng, it is just a 
symmetric function of the difference — ng and has its maximum when n^ = ng. 

One could criticize the preceding approach for only focusing on the value of the prob- 
ability density at pm = Pg, while values of pm — Pg close to zero might also be relevant. We 
can avoid the problem of having to deal with a continuous probability by singling out the 
value pm = Pg from the outset. Thus, we formulate a Bayesian analysis with this hypothesis 
and the event E = {n^^, N^^, ng, Ng}: 

P{p.r.=Pg\E)- PiPn.=Pg)P{E\p^=Pg) 



P{Pm = Pg) P{E\pm = Pg) + P{Pm / Pg) P{E\pm + Pg) ' 
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Here, P{E\p^ / pg) is just the probability of E given any values of and pg, because 
the event pm = Pg has probability zero; namely, 

PiE\p^^Pg) = f^") (^A C dpn. C dpgp^^{l-p^f-^-^-pl-{l-pgf--^^. 
On the other hand, 



Computing the integrals and substituting, we obtain 

PiPm = Pg) K^m, Nra] ng, Ng) 



P{Pm=Pg\E) 



P{Pm = Pg) b{nin, iVm; rig, Ng) + P{pm / Pg) 



where 



b(n N -n N)- P(^\P-^ = P^) (4 2) 



i?(nm + ng + 1, A'^m - + Ng - Ug + 1) 
B{nra + 1, iVm - + 1) B{ng + 1, iVg - ng + 1) 



(4.3) 



This function of {n^, N^,ng, Ng\ coincides with the value of the probability density of 
Pra — Pg at given by Eq. ([4.1[) . Therefore, this approach is consistent with the preceding 
one: if fc(nm, A'^m; ng, A^g) is large, then P(pm = Pg\E) tends to one, independently of the 
prior probability P{pm — Pg)- However, we have no way of estimating this prior probability. 

The assignment of prior probabilities is a usual problem in Bayesian analyses, to the 
extent that Bayes' theory of probability has been deemed subjective. However, there is no 
subjectivity if we indeed understand Bayes' theory as a way of adjusting probabilities in 
regard to new evidence. The Bayes factor defined in Eq. (|4.2|) is such that 



P{pra=Pg\E) A7 N , , -P(Pm=Pg 

log 7771 / ^1 = log K^m, Nra; Ug, Ng) + log ■ 



P{Pm^Pg\E) — ^^.p^p^^p^y 

Hence, we can endow this equation with an information theory meaning: the prior informa- 
tion about the odds of our hypothesis is updated by the information log b provided by the 
event E = {rim, Nra,ng, Ng}. The prior information is null if P{pra = Pg) = PiPm Pg)-, 
but the information provided by the event is independent of any prior probabilities. The 
information provided by E is positive or negative according to whether the Bayes factor is 
larger or smaller than one. The addition of informations is independent of the (common) 
base of the logarithms, but it is convenient to use base two and measure the information in 
bits. If the Bayes factor is larger than one half and smaller than two, the information pro- 
vided by E is smaller than one bit and can hardly be considered significant. For example, 
with Nm = Ng = 200, logg 6(100, 200; 100, 200) = 3.00 bits, logg 6(100, 200; 80, 200) = 0.11 
bits, and log2 6(100, 200; 70, 200) = —3.61 bits, and only the first case or the last case 
provide evidence for or against p^ = pg, respectively. 

Since we actually divide the sample into many cells, we need to generalize the above 
method of comparing binomial distributions to the case of multinomial distributions. This 
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generalization is straightforward, except that we now have to take care of normahzing the 
P{E\-) such that -^i^l') ~ ^- '^^^ resulting Bayes factor is 



6(nmi, . . . ,nmfc;ngi, . . . ,ngk) = 

Bjnrai + ngi + 1, . . . ,nmfc + rigk + 1) 

B{n^i + 1, . . .,n^k + '^)B{ngi + 1, . . . , Ugfc + 1) (/c - 1)! ' 

where {nmi}i=i and {rigj}^^^ are the vectors denoting the numbers of dark-matter and gas 
particles, respectively, in the k cells, and B{xi, . . . , Xk) = r(xi) • • • T{xk)/T{xi + • • • + Xk) 
is the generalized Euler beta function. We can write this Bayes factor as follows: 



6(nml,•••,f^mfc;^^gl,•••,^^gfc) = 

^n^i+ngA / n^k + ngk\ j^m + k-l)\{Ng + k- 1)! 



nmi J \ n^k J (A^m + N„ + k- 1)1 {k - 1)! ' 



(4.4) 



where = n^i + • • • + nmfc and A'^g = rig i + • • • + rigfc are the total numbers of dark- 
matter and gas particles, respectively (which are equal, in our case). The latter form has 
the advantage of being the product of k binomial numbers, one per cell, times an overall 
factor. Each binomial number expresses the number of ways of dividing the total number of 
particles in the corresponding cell between the respective numbers of gas and dark-matter 
particles. We can associate the (base-two) logarithm of that binomial number with a "cell 
entropy" . This entropy is maximal when the numbers of dark-matter and gas particles in 
the cell are equal and vanishes when there are no particles of one type in the cell. 

Let us take = Ng = N. To compute the Bayes factor, we follow an analogous 
procedure to the one employed to compute the cross-correlation coefficient Cgm- Since 
the above-described Bayesian analysis is valid for any multinomial distribution or, in other 
words, the cells are of logical rather than physical nature, we can group several physical cells 
into one. In particular, we can group the less significant cells, namely, the ones with small 
numbers of particles. A systematic procedure for grouping the cells consists in ordering 
them by decreasing total number of particles and separating the most populated ones to 
take them first into account. Thus, we take the first rank cell and compare it against the 
remainder, using the binomial Bayes factor. The evidence for or against pm = Pg cannot 
be considered definitive yet. Then, we proceed to calculate the Bayes information of the 
two more populated cells plus the "cell" with the remainder, and so onwards. If a definite 
trend is soon established, that is to say, if the absolute value of the Bayes information 
grows steadily, we consider it as a solid evidence for or against the hypothesis, according 
to the sign of log2 b. 

4.3 Bayesian analysis of the distributions at several scales 

Here, we apply the above-explained procedure of systematic multinomial Bayesian analysis 
to some relevant cell distributions. We prefer to rank the cells again in order of decreasing 



physical mass, as in Sect. O, rather than in order of decreasing total number of particles. 

We calculate the Bayes information log2 b (in bits) for the hypothesis pm = Pg, consid- 
ering a growing number of the most massive cells. The result is plotted in Fig. |5|, for the 
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Figure 5: Bayesian evidence (in bits) for the equality of distributions (pm = Pg) derived from 
massive cells at I = 2"-'^° (halos) and at I = (transition to homogeneity). 



two most relevant scales: / = N^^^^ = 2~^^ (corresponding to the master cell distributions) 
and / = (the scale of transition to homogeneity). In the first case, we see that the 1000 
most massive halos already show that the evidence against the hypothesis is overwhelming: 
note that log2 6 reaches —130 Kbits and keeps its downward tendency. The evidence in 
the second case is mixed: it is increasingly negative up to the 500th rank, reaching —3 
Kbits, but there it starts growing and becomes positive from the 1550th rank onwards (the 
1550th cell contains 84678 dark matter particles and 83885 gas particles). Considering 
that the total number of cells is = 2^^ = 32768 and that the corresponding total Bayes 
information is 185.4 Kbits, we could say that the evidence of the hypothesis pm = Pg is 
sufficient. However, the most massive cells clearly distinguish both distributions. 

Proceeding to larger scales, namely, to / = 2~^, the above pattern holds. At / = 2~^, 
the Bayes information has some small fluctuations about zero in the first ranks, staying 
above —97 bits, and then it definitely grows, reaching a total of 28.8 Kbits. In this case, 
the evidence for p^ = pg is solid. Of course, the evidence for p^i = Pg is stronger at larger 
I. 

Regarding the origin of the difference between p^ and pg on small scales, let us focus 
on the master cell distributions. An inspection of the dark-matter and gas particle counts 
in massive halos reveals that these consistently have fewer gas particles than dark-matter 
particles. The smaller average number of gas particles is clearly observed in the respective 
log-log plots of counts in cells ranked by total physical mass, which are shown in Fig. ^. We 
observe in the figure that both distributions approximately follow linear log-log laws (sort of 
Zipf's laws), with common slope, but the line that corresponds to the dark-matter particles 
is definitely above. In other words, the massive halos concentrate less gas, although the 
number of gas particles decreases according to the same pattern that the number of dark- 
matter particles. One can also notice that there are more fluctuations in the number of 
gas particles, due to their smaller physical mass. 

It is useful to express the differences between dark-matter and gas particle counts in 



terms of the cell entropies introduced in Sect. 4.2 after Eq. (4.4), as we do next 
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massive halos 




rank 



Figure 6: Particle counts of dark-matter (upper line) and gas (lower line) in halos ranked in order 
of decreasing mass. 



4.4 Entropic difference between the gas and dark-matter distributions 

In the expression ( [4. 41 ) of the Bayes factor, we can consider that the k cells consist of a small 
number of massive cells h = k — 1 {in order of decreasing mass) and a k-th. "cell" containing 
the remaining particles. Furthermore, we assume that the h massive cells contain together 
a total number of particles that is small in comparison with the total number of particles. 
Recalhng that TVm = A^g = iV = 2^° > 1, we can make a suitable approximation of the 
Bayes information log2 b. Indeed, under the given conditions, the largest contributions to 
the Bayes information come from the cell with the remaining particles and from the overall 
factor in Eq. (|4.4[ ); namely, 

f2N -J2i=i{'nmi + ngi)\ A log2(jTN) . 



and 



where we have used Stirling's approximation. Note that both contributions have a first 
term proportional to N, but these large terms cancel one another. Therefore, 



+ 



N 1 
h log2 - - log2 h\ + OiN-') , (4.5) 

which only grows logarithmically with N. This Bayes information is a sum of individual 
cell contributions plus a global contribution. Each cell contribution is negative, because 
the cell entropy is bounded above by the number of particles in the cell, as is easily proved. 
If each massive cell contribution is larger in absolute value than log2(A^/2) = 29 bits, on 
average, the total information due to the h massive cells plus the remainder is negative. 
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In particular, each massive halo contributes, on average, more than log2(A^/2) = 29 bits 
(in absolute value). This is the reason why the total Bayes information of a considerable 
number of massive halos is negative (as shown in Fig. |5|). For example, the contribution of 
the most massive halo, with Ug = 19200 and = 20658, is loga (19200) " 39858 = -38.5 
bits, larger in absolute value than 29 bits. 

The contribution of a massive cell to the Bayes information can be expressed in a more 
familiar form by using again Stirling's approximation. When nm,ng ^ 1, the cell entropy 
can be written as 

™^ ^ J ^ (nm + rig) log2 (nm + Ug) - Ug log2 Hg - nm log2 (4.6) 
= -(nm + ng)[xglog2 2;g + (l-3;g)log2(l-Xg)], (4.7) 
where we have introduced the fraction of gas particles 



Xg 



(the cell entropy has an analogous expression in terms of the fraction of dark-matter parti- 
cles). In those forms, the cell entropy can be identified with the familiar entropy of mixing 
||23t] . Given Xg, the cell entropy of mixing is proportional to the total number of particles in 
the cell; and so is the cell's contribution to the Bayes information, the proportionality con- 
stant being the entropy of mixing per particle minus one. The maximum entropy of mixing 
per particle is one bit and it corresponds to the most mixed distribution, with Xg = 1/2. 
Naturally, a fully mixed cell makes a vanishing contribution to the Bayes information. 

Regarding the master cell distributions, we observe in Fig. ^that the ratio rigj/nmi for 
massive halos is almost constant on average; in fact, Ugi/n^i ~ 0.81. Hence, Xgi ~ 0.45, 
and the entropy of mixing per particle is almost constant and equal to 

— Xg log2 Xg — (1 — Xg) log2(l — Xg) ~ 0.992. 

Therefore, each halo contribution is roughly proportional to the total number of particles 
in it, with a common proportionality constant, namely, 0.992 — 1 = —0.008. This yields 



about —250 bits for the contribution per halo in Eq. (4^). Thus, the absolute value of 
every massive halo contribution is larger than 29 bits, making the total Bayes information 
in Eq. (^]^) negative and regularly decreasing with the number of halos, as displayed in 
Fig. ^ (left). However, the value of the entropy per particle is very close to one, telling us 
that the distributions are very mixed, even though not completely mixed. 

Note that a constant ratio Ugi/n^i for all the cells would be in contradiction with 
-^g = -^m- Thus, the ratio Ugi/n^i ~ 0.81, for example, must grow eventually, as i runs 
over scarcely occupied cells. Even assuming that the ratio Ugi/n^i stays almost constant as 
the cell mass diminishes, the contribution per cell to the Bayes information is proportional 
to the total number of particles in it and, therefore, it must eventually become smaller 
than 29 bits (in absolute value). For one reason or another, the initial downward trend of 
the Bayes information must cease and turn upwards. This turn is observed in the plot for 
I = 2-5 in Fig. |. 
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4.4.1 Connection with thermodynamics 

In thermodynamics, the entropy of mixing is, of course, only one part of the total entropy. 
When other thermodynamic parameters are equal, the entropy of mixing determines the 
equilibrium configuration to be the most mixed distribution. In our context, the gas 
and the dark matter are not comparable thermodynamically because, in principle, cold 
dark matter does not have temperature or pressure. However, CDM particles have velocity 
dispersion, such that one can assign it a temperature and, hence, a thermodynamic entropy 
(independent of the properties of the gas). This is the dark-matter entropy considered by 
Faltenbacher et al [Q] in their study of the entropy of gas and dark-matter clusters from the 
Mare-Nostrum universe. Once the dark matter is assigned thermal states, it is legitimate 
to compare them with the thermal states of the gas. 

In a mixture of ideal gases, the chemical potential of each gas can be expressed as 

n 

1 23], where n is the number density, C(2^) is an increasing function of T characteristic of 



each gas, and we use units consistent with measuring the entropy in bits. The function 
C{T) is calculated from the possible states of the gas particles (translational and internal 
states); for a monoatomic gas, C(^) oc T^/^. The condition of "chemical" equilibrium of 
gas and dark matter is 

t' ~ 't~ ' 

-'^ g m 

which allows for Tg 7^ Tm. In fact, chemical equilibrium implies 

Cg(r) _ Cm(T) _ Cg(T) _ ng 



Tig Tljn 



m 



and therefore different temperatures for different densities. We have seen above that 
"-g/^-m — 0.81 for massive halos. Hence, assuming that both and Cm correspond to 
monoatomic gases, we deduce that Tg/T^ ~ 0.87. 

The conclusion that the dark matter temperature is higher than the gas temperature 
in massive halos may seem counterintuitive. But note that it relies on the assumption of 
independent local thermodynamical equilibria of dark matter and gas at different but well- 
defined temperatures, with the local temperature of dark matter given by its local velocity 
dispersion. This assumption should imply that the dark matter also has pressure and, 
therefore, its dynamics should be governed by similar equations to the ones that govern 
the gas dynamics. However, the effects of dark-matter pressure are not considered in the 
Mare-Nostrum or other A^-body cosmological simulations. 



5. Entropic comparison of distributions 



In the comparison of the gas and dark-matter distributions, we have found it useful to 
introduce a cell entropy, recognizable as the entropy of mixing. In general, the Boltzmann- 



Gibbs-Shannon (BGS) entropy of a discrete probability distribution {pi}f£i is defined as 



M 



S{{Pi}) = -^Pilog2Pi, (5.1) 



1=1 



and it represents the uncertainty or lack of information of the result of an experiment 
with that probability distribution. Note that we are using now "discrete" in the normal 
sense of the word in probability theory, namely, meaning that there is a list of possible 
events, as opposed to the continuum of possible events in a continuous distribution; but 
the probabilities pi are continuous variables. The entropy has some desirable properties, 
such as the bounds < S{{pi}) < log2 M, and the property of additivity, in particular, 
additivity for independent sets of events |24|. This property and the bounds are shared by 



a uni-parametric class of functions, the Renyi entropies 

5,(te}) = !5i2(££l«!), ,^1. (5.2) 

1 - q 

The value of 5i is obtained as the limit g — )• 1 and it coincides with the standard BGS 
entropy defined by Eq. ( |5.lD . 

We can apply the definition of entropy to a discrete distribution of particles in M 
cells, with occupation numbers {nj}^^ (counts in cells) and hence expected probability 
distribution {pi = ni/N^f^^^. The entropy measures the uncertainty of the cell in which 
an arbitrary particle is located (or a group of q particles, in the case of Sq with g G N). 
In particular, we can interpret Eq. (5.1) as follows. According to Boltzmann, one should 



weight a macroscopical state, given by a set of occupation numbers, with the number of 
microscopical states compatible with it (the Boltzmann weight). Then, the entropy is 
the logarithm of this weight. Since the number of states compatible with the occupation 
numbers {rij}^^ is given by the corresponding multinomial number, the entropy is given 
by the logarithm of that multinomial number, namely, 

\ni--- umJ jr[ 

where we have assumed that nj ^> 1, equivalent to neglecting the effect of particle discrete- 
ness. The entropy per particle S[{pi\) is positive and bounded above by log2 M. If the 
distribution is uniform, the bound is reached; in particular, the bound is log2 M = — log2 V . 
Then, the distribution contains the largest uncertainty or, equivalently, the smallest infor- 
mation. Moreover, all the Renyi entropies reach the same bound. 

Naturally, it is important to know the behaviour of the entropies in the continuum 
limit of the discrete distribution {pi\fLi, as the cell size ^ — >• and M — t- oo (for the 
distribution of N particles in M cells, one must let — )• oo before M — t- oo). Not 
surprisingly, the entropies diverge in the continuum limit: one needs an infinite amount of 
information to locate a point in a continuum. Renyi |24| describes the growth of the Sq 



as the distribution becomes continuous in terms of dimensions; namely, he defines for the 
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continuous distribution the dimensions 



Da = lim 



-\0g2V 



3 5g(te}) 

1 ' 



assuming that the Hmit exits. These Renyi dimensions are standard in multifractal analy- 
sis; they have been already introduced in Sect. |2|, Eq. ( |2.7D , and used in subsequent sections. 
The most important Renyi dimension is Di, which is defined by the divergence of the stan- 
dard BGS entropy and is the dimension of the set of singularities where the probability 
concentrates. Since the full set of Renyi dimensions characterizes the information content 
of the distribution in the continuum limit, we deduce that all the continuous distributions 
with the same spectrum of Renyi dimensions appear equivalent in regard to their infor- 
mation content. In particular, every continuous distribution with Dq = 3, g G M, appears 
equivalent to a homogeneous and uniform distribution, in which the Renyi entropies reach 
their upper bound (note that Dq = 3 is the upper bound to the Renyi dimensions). Indeed, 
only part of the information contained in a continuous distribution is preserved in its Renyi 
dimensions. 

One can further define the information content of a continuous distribution in terms 
of its probability density [^], if this density is well defined. However, we are studying 
distributions with singularities. In a singular distribution, the singularities must be con- 
fined to a set of zero volume, but they can be crucial for determining the distribution 
(for example, consider a distribution concentrated in just one point, namely, a Dirac delta 
distribution). Therefore, let us focus, for the moment, on regular distributions with well- 
defined probability density p{x) everywhere and Dq = 3 for all q.^ The probability in an 
element of volume V is given, as F — t- 0, by p{x)V ^ where x belongs to that element of 
volume (this dependence of probability on volume derives from the local dimension being 
a = 3 everywhere). Therefore, 



where the sum runs over a partition of the total volume in volume-!^ elements (a partition 
in cells, for example). In the limit — ?• 0, we can write the entropy as the sum of a finite 
part and a divergent part, namely. 



In rigorous mathematical terms, the needed regularity condition is absolute continuity with respect 
to the Lebesgue measure, namely, the condition that every set with zero volume (null Lebesgue measure) 
contains no mass. It implies, by the Radon-Nikodym theorem, that the mass distribution is given by the 



singularities, for example, isolated power-law singularities. These singularities are compatible with Di — 3, 
which is the only condition that we actually need in the following. Moreover, there are very mild singularities 
that are compatible with Dq — 3 for all g G R. 



SiiPi}) « -Y,Pi^i)Vlog2[p{xi)V] 






integral of a density that is unique (almost everywhere) [M]. In fact, absolute continuity allows some 



-26- 



Naturally, the divergent part just tells us that Di = 3, whereas the finite part is a non- 
trivial integral of the density. 

The finite part of the total entropy is not defined in an absolute way: for partitions in 
unequal volume elements, when the continuum limit is taken, the logarithm in the integrand 
is replaced with log2[4>{x)p{x)], where (j){x) is a positive function. On the other hand, while 
the total entropy is always positive, its finite part can be negative. For these reasons, it 
is necessary to introduce the relative entropy. Conventionally, the entropy of the density 
p{x) relative to the density q{x) is defined as^ 



S{p\q) = / p{x)d X log2 



where it is understood that p{x) = wherever q[x) = 0. The relative entropy is always 
positive. It is also called the Kullback or Kullback-Leibler divergence, and it is studied 
in detail by Kullback [ p^ ] (note that "divergence" means discrimination measure in the 
statistical context). Therefore, the absolute entropy of a coarse-grained distribution gives 
rise, in the continuum limit, to an absolute part, the dimension, and a relative part, the 
relative entropy.^ Only the latter differentiates regular distributions. Notice that the 
entropy relative to the uniform distribution is simplest but is only defined for distributions 
over a finite volume (in our case, the unit cube).^ 

These results hold for singular multifractal distributions with Di < 3, after the nec- 
essary adaptations. One singular distribution v can be relatively regular, that is to say, it 
can be regular with respect to another singular distribution This essentially means 
that the singularities of v form a subset of the singularities of /i. The entropy of v relative 
to jjL is defined as 

where du{x)/dfi{x) is the density of u with respect to /i at the point x. This relative 
entropy differentiates one multifractal distribution (z/) from another (/i), when the former 
is regular with respect to the latter and, in particular, they have the same dimension Di. 
In fact, S{i'\iJ,) = if and only if = /u. 

The Renyi entropy Sq ( |5.2| ) also gives rise in the continuum limit to a divergent part, 
and hence the dimension Dg, and to a finite part. This finite part motivates the definition 
of the relative Renyi entropy 

' dv{x) ^ ''"^ 



Sq{v\lj) = logs 



dv{x) 



dn{x) 



q^l. 



^Here we incur a slight notational inconsistency, since we have been using q for the parameter in the 

Renyi entropies or dimensions. Hence, we leave it to the reader to discern from the context whether q 

means the probability distributions q{x) or qi or the number q. 

*It is useful (but optional) to also define the relative entropy of discrete distributions 

®The relative entropy with respect to the uniform distribution has been considered as a measure of the 

evolution of inhomogeneity in cosmology by Hosoya, Buchert & Morita [pTf. 

Again, the appropriate mathematical definition of regularity is absolute continuity, now with respect to 

the measure fi (every set with null /i- measure has null ^/-measure). By the Radon-Nikodym theorem, there 

is a density du/dfi, unique except in a set of null /i-measure. 
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However, this relative entropy is less useful than the standard (Kullback-Leibler) relative 
entropy. 

The relative entropy differentiates distributions but has two shortcomings. First, 
S{i'\fi) is only defined when v is ^-regular. Second, the relative entropy does not have 
the necessary properties to qualify as a distance between distributions: it fails to be sym- 
metric or to fulfill the triangle inequality. However, it is possible to define a real distance 
between any two distributions in terms of their entropies. For discrete distributions, Endres 
&: Schindelin [28| define 

Dj,Q = 2S{R) - S{P) - S{Q) 

= V ( Pi log2 + qi log2 ^ I , 

where P = {pi}, Q = {qi} and R = {{pi + qi)/2}. Then, they prove that Dpq is a 
distance. Furthermore, Endres & Schindelin [^8| note that it can be applied to continuous 
distributions. This follows from the alternative expression 

DlQ = S{P\R) + SiQ\R), 

that is to say, from DpQ being a sum of relative entropies, in addition to the fact that any 
two continuous distributions are both regular with respect to their mean. Therefore, Dpq 
is well defined in the continuum limit of P and Q. 

Thus, we can measure the distance between the coarse-grained distributions pi = 
Ugi/N and qi = Urai/N, where ngi,n^i ^ 1, and then we can take the continuum limit. 
The distribution R corresponds to the total particle distribution. The squared distance 
between the coarse distributions is 

1 

DpQ = 2 + — ^ (ng j log2 Ugi + rimi logs "-mi - ("-mi + n-gi) log2(nm j + Ugi)) (5.3) 
i=l 

1 ^ 

= ^ (ragiloggngi + nmilogsnmi + (umi +ngi)[l - log2(nnii + Ugi)]) . (5.4) 

Referring to the expression ( [4.6| ) of the cell entropy, we deduce that, in the sum of terms 

(one per cell) given by Eq. (|5.4D , each term represents the gap between the maximum cell 

entropy of mixing (one bit per particle) and its actual value, just like in the sum of cell 

contributions in the Bayes information (|4.5| ). Naturally, Dpq decreases with mixing and 

vanishes for the most mixed distribution P = Q = R. Conversely, it takes its maximum, 

DpQ = 2, when {rigi} and {n^i} are disjoint, namely, when they are not mixed at all 

[as we deduce from Eq. (^.3|)]. Regarding the continuum limits of P and Q, Endres Sz 

Schindelin's distance is maximal if they are mutually singular, namely, if they concentrate 

in disjoint sets. The continuum limits of disjoint {"g j} and {umi} give rise to two mutually 

singular distributions but the definition encompasses more general cases. 

^^The definition of mutually singular distributions is given by, e.g., Capinski & Kopp A particularly 
clear case of mutually singular distributions occurs when they have disjoint supports, but this is not nec- 
essary: for example, the uniform distributions in the Cantor set and in the unit interval, respectively, are 
mutually singular, although the Cantor set is contained in the unit interval. 
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Let us notice that the above defined statistical distance is consistent with our Bayesian 
analysis but cannot replace it. Firstly, it relies on the approximation ngi,n^i ^ 1, that is 
to say, on neglecting the discreteness efi'ect due to particle counts. In this approximation, 
the entropy of mixing in the form given by Eq. ( |4.6| ) is just the asymptotic form of the cell 
entropies in Eq. ( |4.5| ); but note that the global contribution in Eq. ( |4.5D diverges as N ^ oo. 
Lastly, it is a general fact that a statistical distance cannot provide a sharp criterion to 
decide if two discrete distributions are samples from the same continuous distribution, and 
it is on the same footing as the cross-correlation coefficient in that regard. 

Endres &: Schindelin's distance can be connected with a standard statistical measure 
of discrimination as follows. Let us note that DpQ adopts a simplified form when P and 



Q are close p8[, namely. 



where the last expression refers to Pearson's chi-square test of discrimination, which can 
be considered a particular case of the Endres-Schindelin distance. In our case, 

X =2^ 



1=1 



The chi-square test has the advantage of highlighting that the expected fluctuations of 
— ngil in a common distribution are of the order of {n^i + n-gj)^/^. At any rate, the 
test is based on an approximation of DpQ and neither can it provide a sharp criterion of 
discrimination. 



5.1 Bias as entropic distance 

In cosmology, the bulk of mass belongs to the dark matter, so the distribution of gas (or 
galaxies) is assumed to be "biased" with respect to the total matter distribution, domi- 
nated by the dark matter. Since we normalize to one both the dark matter and the gas 
total masses, both components play a symmetrical role in our statistical analyses. There- 
fore, our measure of bias must be just a measure of discrimination between two probability 
distributions (a "divergence" or distance). There are many such measures, but the notions 
of relative entropy and Endres-Schindelin distance naturally arise in connection with our 
Bayesian analysis. Regarding the Endres-Schindelin distance, mutually singular distribu- 
tions are most distant, namely, at distance \/2. This distance diminishes if the distributions 
concentrate in a common set, but vanishes only when they coincide. The relative entropy 
is not a distance but it is useful as well, because it diverges for mutually singular distribu- 
tions and, therefore, it separates distributions better. In fact, the relative entropy can be 

^^The connection of Pearson's chi-square test with information theory can be obtained directly from the 
relative entropy j26|. However, XpQ is much closer to Endres & Schindelin's distance: it is also a distance 
and, furthermore, XpQ/(21n2) < D%q < Xpq , for any P and Q. 
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symmetrized with respect to the compared distributions, and then it diverges unless they 
are mutually regular. 

The simplest example of comparison of two distributions occurs when they are both 
regular, in particular, when they have everywhere well-defined densities p{x) and q{x). In 
spite of their individual regularity, they are mutually singular if they do not overlap, that 
is to say, if each density is positive only where the other density vanishes, then being at 
Endres-Schindelin distance \/2. As they overlap more and, furthermore, the densities ap- 
proach one another, their Endres-Schindelin distance and their symmetric relative entropy 
tend both to zero. On the other hand, the symmetric relative entropy is finite only if both 
distributions vanish in the same point set (disregarding sets of zero volume, of course). 

Regarding singular distributions, the first condition for two distributions to be at small 
Endres-Schindelin distance is that they have the same Renyi dimensions and, therefore, the 
same multifractal spectrum. However, this condition is far from being sufficient. Indeed, 
the multifractal spectrum only gives the "size" (the dimension) of every set of singularities 
with common strength (local dimension), but tells us nothing about the precise geometry 
(location or shape) of those sets. Like in the case of regular distributions, two distributions 
are at small Endres &: Schindelin's distance if the strength and location of their mass con- 
centrations, in particular, their singularities, essentially coincide. As regards the symmetric 
relative entropy, the singularities must actually coincide for it to be finite. 

It has been remarked above that a statistical distance (or divergence) cannot provide 
a sharp distinguishability criterion. In fact, the distinguishability criterion provided by 
the Bayes factor only makes sense for finite point distributions, namely, for deciding if two 
finite point distributions can be samples from the same multinomial distribution. In this 
regard, the Bayesian comparison of the dark-matter and gas cell distributions in Sect. 4.3 
has clearly ruled out a common multinomial distribution on nonlinear scales. Nevertheless, 
the entropy of mixing per particle is very close to the maximum of one bit; for example, it is 
0.992 bits for massive halos in the master cell distributions. Therefore, the two distributions 
are indeed very mixed (very close). 

Furthermore, the closeness of the gas and dark matter distributions suggests that their 
individual singularities coincide and, therefore, the two distributions are mutually regular. 
In the coarse formalism that we use, the local dimension of cell i is 

_^ log[n,/(Wo)] 

iog(y/vb) ■ 

Therefore, the difference between the strenghs of gas and dark matter singularities is 

_ , log(^gi/^m») 

iog(y/vb) 

We can see that this difference vanishes if Ugi/n^i stays bounded (above and below) while 
the cell volume V shrinks. Although we have found that the ratio Ugi/n^i is not unity in 

i^The symmetric relative entropy S{P\Q) + S{Q\P) is called the Jeffreys divergence J{P,Q) ^ |§. 
Despite being symmetrical, it is not a proper distance, for it still fails to fulfill the triangle inequality. It 
is trivially finite for distributions that are mutually regular, namely, absolutely continuous with respect to 
one another. Kullback fed] always works within an equivalence class of mutually regular distributions. 
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populated cells, its logarithm is small (in absolute value) with respect to — log(y/Vo) at 
the lower end of the multifractal scaling range, thus making Ogi and Omi almost equal. In 
general, if we define a local bias factor as the local relative gas concentration, the condition 
for common gas and dark-matter singularities is mild: the local bias factor must be bounded 
away from zero and infinity. 



6. Discussion and Conclusions 

We have improved the method of coarse multifractal analysis based on counts in cells by 
devising a procedure for extracting from a sample of a distribution the maximal information 
about its multifractal properties. The procedure is based on a clear understanding of the 
role of the upper and lower cutoffs to scaling, which are, respectively, the homogeneity and 
discreteness scales. The homogeneity scale is used in the definition of coarse multifractal 
exponents [Eq. while the discreteness scale is crucial to understand and quantify the 

effects of under-sampling. We have employed our procedure to analyse the gas and dark 
matter distributions in the Mare- Nostrum universe at redshift z = 0. 

The only intrinsic scale present in an iV-body simulation is actually the discreteness 
scale V = (besides the size of the simulation cube, which we take as the reference 

scale). The homogeneity scale is present as well but it is dynamical and grows with time. 
Between these two scales the matter distribution can be considered continuous and repre- 
sentative of the nonlinear dynamics. The discreteness scale V = N^^ defines what we call 
the master cell distribution, which best resolves the overall mass distribution. The mass 
function of objects at this scale (halos) adopts a power-law form with a large-mass cutoff, 
similar to the Press-Schechter mass function. However, its power-law exponent is —2, which 
would correspond to an initial power spectrum with index n = —3 in the Press-Schechter 
theory, whereas the actual value in the Mare-Nostrum universe is n = 1. In conclusion. 



the Mare-Nostrum mass function confirms the form of the mass function found in Ref. |10| 
and its independence of the initial power spectrum. 

Of course, the Press-Schechter theory and the consequent mass function are not appli- 
cable to equal-size objects. However, Vergassola et al [^], in their study of the adhesion 
model (described in Ref. [||), also define coarse-grained objects of equal size and, neverthe- 
less, they find a power-law mass function with exponent depending on the initial spectral 
index and with an exponential large-mass cutoff, like in the Press-Schechter theory. On 



the other hand, Vergassola et al [29| show that the adhesion model gives rise to a multi- 
fractal cosmic- web structure (see also Ref. [^^). In this regard, it is especially interesting 
to compare our results with theirs, and to emphasize that the power-law exponent —2 is 
unrelated to the initial power spectrum, unlike their power-law exponent. The dependence 
of their power-law exponent on the initial power spectrum is surely due to the nature of the 
Zel'dovich approximation, in which the dynamics is trivial before the formation of singu- 
larities. In contrast, the real gravitational dynamics is chaotic. Therefore, the multifractal 
attractor of the real dynamics is independent of the initial conditions and must arise even 
when the initial conditions do not have a scale invariant power spectrum. 
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The mass function power-law exponent —2 is, in fact, naturally associated with the mul- 
tifractal mass concentrate. Furthermore, we find that the precise form of the exponential 
large-mass cutoff suggests that the power law is actually an approximation of a lognormal 
mass function, as expected in a multifractal [lOj and found in the Mare-Nostrum universe 
on larger scales. 

Our first direct test of scale invariance consists in calculating the coarse multifractal 
spectrum in a range of nonlinear scales, namely, from I = 2~^^ up to 2~^. For this, we use 
the improved definition of coarse exponents (|2.6|) , which includes the scale of homogene- 
ity (estimated through the condition 112 = 1.1). This improvement is necessary when the 
scale of homogeneity is considerable smaller than the box size. The resulting multifractal 
spectra (Fig. ^ agree in their respective ranges (except near Omax)- Moreover, the spec- 
tra corresponding to the dark matter and to the gas are almost identical. However, the 
introduction of the scale of homogeneity produces an anomalous extension of the multi- 
fractal spectrum: it gives rise to negative fractal dimensions. They can be understood as 
representing improbable matter fluctuations that can be ignored. 

From the multifractal spectra, we deduce two important dimensions, namely, the di- 
mension of the mass support Dq = 3 and the dimension of the mass concentrate Di ~ 2.4. 
Both dimensions provide information on the type of multifractal cosmic-web structure. The 
former dimension shows that this multifractal is non-lacunar while the latter shows that it 
is not very concentrated. The overall weak concentration indicated by Di ~ 2.4 can be due 
to the dominance of surface singularities ("pancakes") but can also be due to the clustering 
of lower dimensional singularities, namely, filaments or nodes. Cosmic web singularities are 
difficult to define in galaxy or A^-body samples, but can be partially unveiled with appro- 
priate algorithms [^, At any rate, one must notice that a non-lacunar cosmic web 
structure has a very complex geometry [11 1. Of course, this geometry is determined by the 
dynamics of gravitational collapse and, in particular, by its type of anisotropy; but further 
discussion of this question is beyond the scope of this work (the role of anisotropic collapse 
in the formation of the cosmic web is discussed in Ref . |2| , for example) . 

Our study of the multifractal spectra on decreasing scales from / = 2^'^ to 2^^^, 
including the discreteness scale I = N~^/^ = 2~^^, allows us to discern the progressive 
infiuence of discreteness. The most obvious change is, of course, the shrinking range of a, 
namely, the reduction of Omax caused by lack of mass resolution: depleted small cells must 
be empty. Furthermore, the mass distribution is under-sampled in cells with few particles, 
altering the ends of the spectra near Omax- We can measure these deviations, for we can 
compare small scale spectra with the complete spectra at / = 2~^. Actually, the spectra 
are almost complete at / = 2~^. For / > 2"'', there appear early signs of the transition to 
homogeneity. 

It is interesting to connect our results about the influence of discreteness, which only 
concern the statistical properties of the redshift z = distributions, with the studies 
by Kuhlman, Melott & Shandarin [^] and Splinter et al [0] of the dynamical effects of 
discreteness. Those authors conclude that these effects are the more important the less 
converging the particle motion is. Thus, we have, on the one hand, that expanding volume 
elements give rise to voids, with local dimension a > 3, which are only well represented in 
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the multifractal spectra corresponding to scales considerably larger than I = N^^^^. On the 
other hand, collapsing volume elements give rise to mass concentrations with the smaller 
dimension the larger is the number of independent axis along which they collapse. These 
mass concentrations can be well represented in the spectra corresponding to / < N^^^'^. 
For example, isotropic collapse gives rise to the smallest dimension concentrations, which 
are the most robust against the effects of undersampling; and, in fact, the low-a end of the 
multifractal spectrum is essentially correct even for scales / < 2~^^. However, the strong 
singularities with low a do not represent the full cosmic web structure. 

Our second and most direct test of scale invariance is made in the standard way, namely, 
by studying the dependence of the second order moment M2 on the scale I: we calculate 
M2{1) from / = 2^12 a broad range that includes the discreteness and homogeneity 

scales. On the smaller scales, we correct for the effect of discreteness by suppressing under- 
sampled cells, according to the information provided by the already computed spectra. We 
find two well-defined scaling ranges: the fractal range, spanning from I = 2-^2 2-6, and 
the homogeneous range, from / = 2~^ upwards. The transition to homogeneity takes place 
between / = 2~^ and / = 2~^. For definiteness, we choose as homogeneity scale Iq = 2~^, 
which in physical units is 16 Mpc. The fractal correlation dimensions are D2 = 1.26, 
for the dark-matter, and D2 = 1.30, for the gas, in accord with conventional values of the 
galaxy correlation dimension |13, 

To find out if the equivalence of the gas and dark matter distributions goes beyond 
their scaling properties, we have undertaken a detailed statistical study of the relation be- 
tween these distributions. Since we employ the method of counts in cells, we have specified 
two kinds of comparison: (i) the two cell distributions, defined by their respective sets of 
occupation numbers {nj}, are compared as if they were two discrete probability distribu- 
tions with respective probabilities {pi = rii/N}; (ii) the two cell distributions {rimi} and 
{rig j} are compared to decide if it is likely that they are samples from the same multinomial 
distribution (given by some coarse distribution {pi}). The first kind of comparison leads us 
to measures discriminating between discrete probability distributions (and between their 
continuum limits). We have considered firstly the cross-correlation coefficient and lastly 
entropic distances (or "divergences"), actually motivated by our method of deciding if two 
cell distributions are samples of the same multinomial distribution. Since there are many 
(pseudo) distances to discriminate between discrete probability distributions, the compari- 
son based on one of them has no absolute value. However, all the measures that we employ 
to discriminate between the coarse gas and dark matter distributions tell us that they are 
very close. 

To decide if it is likely that the two cell distributions {n^i} and {?^g^} are samples from 
the same multinomial distribution, we develop a Bayesian method of analysis. The two 
distributions are compared by means of the Bayes information about the equality pm = Pg, 
namely, by means of the logarithm of the corresponding Bayes factor ( [4.21 ). The Bayes 
information corresponding to a set of massive cells can be expressed as a sum of negative 
cell terms, proportional to the entropy of mixing per particle minus one, added to a positive 
global term. The application of this formula to the master cell distributions, starting from 
the most massive halos, demonstrates gas biasing. In particular, the gas is less concentrated 
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in massive halos. The bias is attenuated on larger scales but only disappears at I = 2~^, 
namely, at the scale of full homogeneity. Naturally, it is to be expected that there is no 
bias at homogeneity, for it essentially preserves the initial conditions. However, we do not 
have any argument that forbids that the bias vanishes at a smaller scale, so the fact that 
it vanishes only at homogeneity could be coincidental. 

Since the Bayesian analysis can be formulated in terms of the entropy of mixing, we 
have studied in detail the entropic comparison of continuous distributions. We must as- 
sume that the Renyi entropies of the compared distributions have well defined continuum 
limits, which amounts to assuming that the distributions are multifractal (including regu- 
lar distributions with Dq = 3). Thus, the first element of comparison is the spectrum of 
Renyi dimensions or, equivalently, the multifractal spectrum. As regards their multifractal 
spectra, the dark matter and gas distributions in the Mare-Nostrum universe are indistin- 
guishable. However, the multifractal spectrum gives the sizes of the sets of dark-matter 
or gas concentrations (or depletions) with equal strength but is insensitive to the location 
of those sets. In fact, the Renyi dimensions only contain partial information about a con- 
tinuous distribution. In particular, Di represents only one part of its entropy. Another 
part of the entropy is of relational nature and can be expressed as a relative entropy or 
as a statistical entropic distance equal to (the square root of) the neg- entropy of mixing, 
proportional to one minus the entropy of mixing per particle. The high entropy of mixing 
or small entropic distance between the gas and dark-matter distributions is due to the fact 
that their respective singularities actually coincide, namely, the respective singularities at 
the same positions have equal local dimensions. 

The appearance of common singularities in the gas and in the dark matter surely has 
a physical origin, despite the differences between the dynamics of each component. It is 
natural to conjecture that the common multifractal structure is due to the fact that the 
gas and the dark matter are both dominated, on a long range of scales, by the gravita- 
tional interaction, which produces common power-law singularities. The differences in the 
dynamics are the cause of gas biasing but do not interfere with the essential multifractal 
features of the distributions (except on very small scales). In fact, the Mare-Nostrum uni- 
verse is not based on a very realistic model of gas dynamics, insofar as it does not consider 
thermal radiation or conduction. Nevertheless, if the cosmic web singularity structure is 
due to gravity only, the analysis of future simulations will corroborate that the gas biasing 
does not alter that structure. Then, we can speak of a kind of universality: the cosmic 
dynamics has a unique type of cosmic web multifractal attr actor, independent of the initial 
conditions. In particular, the multifractal spectrum obtained here from the Mare-Nostrum 



universe or before from the GIF2 simulation |1C] must be characteristic of the cosmic web. 
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